
fix several arm7l portability issues #1023

Merged: grondo merged 13 commits into flux-framework:master from garlick:arm7l on Apr 3, 2017


@garlick (Member) commented Apr 2, 2017

This PR gets flux-core working on the 32-bit arm7l architecture used in the Raspberry Pi and other embedded platforms. Changes include:

  • fix 64-bit values passed as printf-style arguments with an ambiguous format specifier (e.g. %lu)
  • relax some arbitrary test timeouts
  • fix a 64-bit pointer assumption in makecontext() usage
  • reduce the VM footprint of a test that tried to malloc 20 GB of memory

There are a few unrelated minor bug fixes here as well:

  • fix NULL deref in content-cache error path
  • fix missing newlines in log output
  • work with the latest zeromq (4.2.1)

Construct makecontext() arguments properly on systems
like arm7l that have 32-bit pointers.

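
POSIX makecontext() passes only int arguments to the new context's entry function, so code that hands a pointer through it has to account for pointer width. One common portable approach, sketched here, is to always split the pointer into two 32-bit halves and rejoin them in the entry function; on a 32-bit target like arm7l the high half is simply zero. This assumes glibc's <ucontext.h>; `struct job`, `job_entry()` and `run_job()` are hypothetical names, and this is not necessarily how flux-core's coproc code does it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical work item handed to the new context. */
struct job {
    int input;
    int result;
};

static ucontext_t caller_ctx;
static ucontext_t job_ctx;

/* Entry function: reassemble the pointer from two int-sized halves.
 * The int <-> uint32_t round trip relies on GCC/Clang's modulo conversion. */
static void job_entry (int hi, int lo)
{
    uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
    struct job *job = (struct job *)(uintptr_t)bits;

    job->result = job->input * 2;
    /* Returning switches to uc_link (caller_ctx). */
}

/* Run job_entry() on its own stack and wait for it to return.
 * Returns 0 on success, -1 on failure. */
static int run_job (struct job *job)
{
    const size_t stack_size = 64 * 1024;
    /* Widen to 64 bits first so the >> 32 below is valid even when
     * uintptr_t is only 32 bits wide. */
    uint64_t bits = (uint64_t)(uintptr_t)job;
    void *stack = malloc (stack_size);
    int rc = -1;

    if (!stack)
        return -1;
    if (getcontext (&job_ctx) == 0) {
        job_ctx.uc_stack.ss_sp = stack;
        job_ctx.uc_stack.ss_size = stack_size;
        job_ctx.uc_link = &caller_ctx;
        /* Always pass two ints; on 32-bit targets the high half is 0. */
        makecontext (&job_ctx, (void (*)(void))job_entry, 2,
                     (int)(uint32_t)(bits >> 32), (int)(uint32_t)bits);
        rc = swapcontext (&caller_ctx, &job_ctx);
    }
    free (stack);
    return rc;
}
```

Always passing two ints keeps a single code path; the alternative is an `#if UINTPTR_MAX > UINT32_MAX` branch that passes one int on 32-bit targets.
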
Use the %zu format specifier for a size_t argument
to avoid compilation errors on arm7l.

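
size_t is 32 bits on arm7l and 64 bits on x86_64, so a fixed specifier such as %lu matches it on one platform but not the other. The C99 `z` length modifier always matches size_t. A minimal sketch; `format_size()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a byte count.  %zu matches size_t on both 32- and 64-bit targets,
 * whereas %lu only happens to match where size_t is unsigned long. */
static int format_size (char *buf, size_t len, size_t nbytes)
{
    return snprintf (buf, len, "%zu bytes", nbytes);
}
```
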
Use "%" PRIi64, not %ld, for an int64_t argument to avoid
compilation errors on arm7l.

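
PRIi64 is not a specifier by itself: <inttypes.h> defines it as a string literal (typically "li" on LP64 systems and "lli" on 32-bit ones) that is concatenated after a literal "%" at compile time. A minimal sketch with a hypothetical `format_i64()` helper:

```c
#include <inttypes.h>
#include <stdio.h>

/* "%" PRIi64 becomes "%li" or "%lli" at compile time, matching int64_t
 * whichever underlying type the platform uses for it. */
static int format_i64 (char *buf, size_t len, int64_t value)
{
    return snprintf (buf, len, "%" PRIi64, value);
}
```
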
Problem: running 10,000 coprocs causes coproc_create()
to use 10,000 * 2 MB = 20 GB of VM, resulting in ENOMEM on
a system with 32-bit pointers.

The requested amount of memory cannot be represented in
32-bit pointers.  In addition, the default vm.overcommit_memory
setting (0: heuristic overcommit handling) may reject an
allocation greatly in excess of physical memory.
Run 500 instead of 10K coprocesses, which is still
a reasonable test.

In addition, clean up the test so that it doesn't produce
a series of cascading segfaults and obscure failures when
this happens.  Fail the first test and continue with
however many coprocs could be allocated.

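
For scale: 10,000 coprocs at 2 MB each is about 20 GB, far more than the at most 4 GiB a 32-bit pointer can address, while 500 at 2 MB each is about 1 GB. The "fail once and continue" cleanup can be sketched as an allocation loop that stops at the first failure and reports how many allocations succeeded. `alloc_stacks()` and `free_stacks()` are hypothetical helpers built on plain malloc(), not coproc_create().

```c
#include <stddef.h>
#include <stdlib.h>

/* Try to allocate up to n buffers of size bytes each.  Rather than
 * aborting (and cascading into later crashes) on the first failure,
 * stop and return how many succeeded, so the caller can fail a single
 * check and carry on with what it got. */
static size_t alloc_stacks (void **stacks, size_t n, size_t size)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(stacks[i] = malloc (size)))
            break;
    }
    return i;
}

/* Release the first count buffers returned by alloc_stacks(). */
static void free_stacks (void **stacks, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free (stacks[i]);
}
```

A test would then report one failure when the returned count is less than n and run its remaining checks against only the buffers that were actually allocated.
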
Problem: broker attr test fails on the arm7l architecture.

Change strtol() to strtoul() when converting UINT_MAX - 1,
which exceeds LONG_MAX where long is 32 bits, as on arm7l.

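
A checked conversion along these lines is sketched below. strtol() clamps out-of-range input to LONG_MAX and sets errno to ERANGE, whereas strtoul() returns unsigned long, which is at least as wide as unsigned int on every platform. `parse_uint()` is a hypothetical helper, not the broker's actual attribute code.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal unsigned int; returns 0 on success, -1 on bad input. */
static int parse_uint (const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    while (isspace ((unsigned char)*s))
        s++;
    if (*s == '-')              /* strtoul() would silently negate "-1" */
        return -1;
    errno = 0;
    v = strtoul (s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v > UINT_MAX)
        return -1;
    *out = (unsigned int)v;
    return 0;
}
```
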
Backport of grondo/lua-affinity#2 to the copy of lua-affinity in
flux-core.

Fixes flux-framework#975

Problem: flux-ping segfaults in flux_rpcf_multi() on arm7l,
which has 32-bit struct timeval members.

The "I" format specifier refers to a 64-bit argument in
json_vpack_ex(), and passing 32-bit arguments triggers a segfault.
The segfault is avoided in flux-ping by casting the tv_sec and tv_usec
arguments to uint64_t.

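
Why the cast matters: in a varargs call, the callee decides how many bytes to read for each argument. If the format says 64 bits but the caller passes a 32-bit tv_sec, va_arg() reads past it, which is undefined behavior and can crash as flux-ping did. The sketch below uses a hypothetical stand-in, `sum_i64()`, that reads every argument as int64_t, much as Jansson's "I" conversion reads a 64-bit json_int_t; it casts to int64_t rather than the commit's uint64_t, which has the same width.

```c
#include <stdarg.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical varargs consumer: reads count arguments as int64_t. */
static int64_t sum_i64 (int count, ...)
{
    va_list ap;
    int64_t total = 0;

    va_start (ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg (ap, int64_t);   /* consumes 8 bytes per argument */
    va_end (ap);
    return total;
}

/* Convert a timeval to microseconds through the varargs API.  Without the
 * explicit casts, 32-bit tv_sec/tv_usec members would be passed as 4-byte
 * values and va_arg(ap, int64_t) would misread them. */
static int64_t timeval_to_usec (const struct timeval *tv)
{
    return sum_i64 (2, (int64_t)tv->tv_sec * 1000000, (int64_t)tv->tv_usec);
}
```
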
Problem: 5s timeouts on tests 1 and 6 are too tight
for some systems.

Actual runtimes on a Raspberry Pi were 5.7s and 8.4s,
respectively.  Increase the timeouts to 10s and 20s.

Problem: wlog_fatal() sends log messages to flux_log() or
vfprintf (stderr, ...) depending on context.  The former
takes messages without newlines; the latter requires a newline.

Append a newline character to the stderr stream when that
mode is selected.  Drop trailing newlines from a couple
of wlog_fatal() calls to be consistent with the majority
of other calls.

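
The newline rule can be captured by formatting the message once and adding the newline only on the stdio path. A minimal sketch; `log_sink_f` and `log_msg()` are hypothetical and only mirror the two destinations described above (a flux_log()-style sink that takes messages without newlines, and a stdio stream that needs one).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sink for the structured-log path; expects no newline. */
typedef void (*log_sink_f) (const char *msg, void *arg);

static void log_msg (log_sink_f sink, void *arg, FILE *stream,
                     const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start (ap, fmt);
    vsnprintf (buf, sizeof (buf), fmt, ap);   /* truncates long messages */
    va_end (ap);

    if (sink)
        sink (buf, arg);                       /* message without newline */
    else
        fprintf (stream, "%s\n", buf);         /* stdio output gets one */
}
```
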
Problem: zeromq-4.2.1 reports EHOSTUNREACH as "Host unreachable",
but "No route to host" is canonical on Linux and we have some
tests that depend on it, so remap here.

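
The remap amounts to special-casing one errno value before falling back to the library's message. A minimal sketch; `canonical_errstr()` is a hypothetical helper that takes the library's message as a parameter (for example, the result of zmq_strerror(errnum)) so that it builds without libzmq.

```c
#include <errno.h>

static const char no_route_msg[] = "No route to host";

/* Return the canonical Linux wording for EHOSTUNREACH, which some tests
 * compare against; otherwise pass the library's message through unchanged. */
static const char *canonical_errstr (int errnum, const char *lib_msg)
{
    if (errnum == EHOSTUNREACH)
        return no_route_msg;
    return lib_msg;
}
```

A caller linked against libzmq would use it as `canonical_errstr (errno, zmq_strerror (errno))`.
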
Problem: on the arm7l architecture, wrexecd behaves erratically.

Add printf-style format attributes to internal varargs debugging
functions to enable type checking, then fix a couple of places where
the format specifier (%lu) implied a 32-bit argument on this
architecture, but the argument was 64-bit.  Also fix warnings about
non-literal format strings.

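
The type checking comes from the GCC/Clang `format` function attribute, which marks which parameter is the printf-style format string and where the variadic arguments start, so -Wformat can catch a %lu paired with a 64-bit value. A minimal sketch; `debug_fmt()` and `describe_task()` are hypothetical, not wrexecd's actual helpers.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdio.h>

/* Argument 3 is the format string; variadic arguments start at 4. */
static int debug_fmt (char *buf, size_t len, const char *fmt, ...)
    __attribute__ ((format (printf, 3, 4)));

static int debug_fmt (char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start (ap, fmt);
    n = vsnprintf (buf, len, fmt, ap);
    va_end (ap);
    return n;
}

/* A checked call site: "%lu" here would draw a -Wformat warning on 32-bit
 * targets, while "%" PRIu64 matches uint64_t everywhere.  To print a
 * runtime string, use debug_fmt (buf, len, "%s", msg) rather than passing
 * msg as the format. */
static int describe_task (char *buf, size_t len, uint64_t id)
{
    return debug_fmt (buf, len, "task %" PRIu64, id);
}
```
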
Due to a typo in which two variables were interchanged, an
error-handling code path in cache_store_continuation() could
trigger a segfault on the arm7l test system.

Fix the typo.

@coveralls

Coverage increased (+0.008%) to 78.194% when pulling 62fdb65 on garlick:arm7l into 56b4c56 on flux-framework:master.

@codecov-io commented Apr 2, 2017

Codecov Report

Merging #1023 into master will decrease coverage by 0.14%.
The diff coverage is 78.94%.

@@            Coverage Diff             @@
##           master    #1023      +/-   ##
==========================================
- Coverage   78.01%   77.87%   -0.15%     
==========================================
  Files         130      150      +20     
  Lines       23912    25491    +1579     
==========================================
+ Hits        18655    19851    +1196     
- Misses       5257     5640     +383
Impacted Files                                  Coverage Δ
src/bindings/lua/lua-affinity/lua-affinity.c    97.5% <ø> (ø)             ⬆️
src/common/libutil/log.c                        86.53% <100%> (ø)
src/common/libutil/coproc.c                     87.87% <100%> (+0.18%)    ⬆️
src/broker/content-cache.c                      73.03% <100%> (ø)         ⬆️
src/common/libflux/flog.c                       95.32% <100%> (+0.04%)    ⬆️
src/cmd/flux-jstat.c                            85.32% <100%> (ø)         ⬆️
src/cmd/flux-ping.c                             87.4% <100%> (+0.18%)     ⬆️
src/modules/wreck/wrexecd.c                     75.66% <55.55%> (-0.07%)  ⬇️
src/cmd/builtin/nodeset.c                       5% <0%> (-72.35%)         ⬇️
src/common/libcompat/info.c                     0% <0%> (-67.57%)         ⬇️
... and 49 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 56b4c56...62fdb65.

@coveralls

Coverage decreased (-0.0005%) to 78.186% when pulling 62fdb65 on garlick:arm7l into 56b4c56 on flux-framework:master.

@grondo merged commit a028c54 into flux-framework:master on Apr 3, 2017
@garlick deleted the arm7l branch on Apr 4, 2017 at 16:06
@grondo mentioned this pull request on Aug 23, 2017